293 research outputs found

    On the Origins of Coccinelle

    Get PDF

    <strong>Generic Patch Inference</strong>

    No full text

    SEWordSim: Software-Specific Word Similarity Database

    Get PDF
    International audienceMeasuring the similarity of words is important in accurately representing and comparing documents, and thus improves the results of many natural language processing (NLP) tasks. The NLP community has proposed various measurements based on WordNet, a lexical database that contains relationships between many pairs of words. Recently, a number of techniques have been proposed to address software engineering issues such as code search and fault localization that require understanding natural language documents, and a measure of word similarity could improve their results. However, WordNet only contains information about words senses in general-purpose conversation, which often differ from word senses in a software-engineering context, and the software-specific word similarity resources that have been developed rely on data sources containing only a limited range of words and word uses.In recent work, we have proposed a word similarity resource based on information collected automatically from StackOverflow. We have found that the results of this resource are given scores on a 3-point Likert scale that are over 50% higher than the results of a resource based on WordNet. In this demo paper, we review our data collection methodology and propose a Java API to make the resulting word similarity resource useful in practice.The SEWordSim database and related information can be found at http://goo.gl/BVEAs8. Demo video is available at http://goo.gl/dyNwyb

    Improving pattern tracking with a language-aware tree differencing algorithm

    Get PDF
    International audienceTracking code fragments of interest is important in monitoring a software project over multiple versions. Various approaches, including our previous work on Herodotos, exploit the notion of Longest Common Subsequence, as computed by readily available tools such as GNU Diff, to map corresponding code fragments. Nevertheless, the efficient code differencing algorithms are typically line-based or word-based, and thus do not report changes at the level of language constructs. Furthermore, they identify only additions and removals, but not the moving of a block of code from one part of a file to another. Code fragments of interest that fall within the added and removed regions of code have to be manually correlated across versions, which is tedious and error-prone. When studying a very large code base over a long time, the number of manual correlations can become an obstacle to the success of a study. In this paper, we investigate the effect of replacing the current line-based algorithm used by Herodotos by tree-matching, as provided by the algorithm of the differencing tool GumTree. In contrast to the line-based approach, the tree-based approach does not generate any manual correlations, but it incurs a high execution time. To address the problem, we propose a hybrid strategy that gives the best of both approaches

    Faults in Linux 2.6

    Get PDF
    In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts on improving the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6; released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available

    Automating the Porting of Linux to the VirtualLogix Hypervisor using Semantic Patches

    Get PDF
    International audienceVirtualization is a promising technology for running multiple operating systems (OS’s) on a single processor. Preparing an OS for use with virtualization, however, involves making some changes to the OS code, which must be repeated for each version, whether a new release or a client customization. Typically such changes are expressed as patches, but patches are often not portable from one version to another, and thus manual adjustments are needed as well. In this paper, we consider the use of the automated transformation system Coccinelle to perform the changes required to port several versions of Linux to the VLX hypervisor. Coccinelle provides a notion of semantic patches, which are more abstract than standard patches, and thus are potentially applicable to a wider range of OS versions. We have applied this approach in the context of Linux versions 2.6.13, 2.6.14, and 2.6.15, for the ARM architecture

    Dynamic Consolidation of Highly Available Web Applications

    Get PDF
    Datacenters provide an economical and practical solution for hosting large scale n-tier Web applications. When scalability and high availability are required, each tier can be implemented as multiple replicas, which can absorb extra load and avoid a single point of failure. Realizing these benefits in practice, however, requires that replicas be assigned to datacenter nodes according to certain placement constraints. To provide the required quality of service to all of the hosted applications, the datacenter must consider of all of their specific constraints. When the constraints are not satisfied, the datacenter must quickly adjust the mappings of applications to nodes, taking all of the applications' constraints into account. This paper presents Plasma, an approach for hosting highly available Web applications, based on dynamic consolidation of virtual machines and placement constraint descriptions. The placement constraint descriptions allow the data- center administrator to describe the datacenter infrastructure and each appli- cation administrator to describe his requirements on the VM placement. Based on the descriptions, Plasma continuously optimizes the placement of the VMs in order to provide the required quality of service. Experiments on simulated configurations show that the Plasma reconfiguration algorithm is able to man- age a datacenter with up to 2000 nodes running 4000 VMs with 800 placement constraints. Real experiments on a small cluster of 8 working nodes running 3 instances of the RUBiS benchmarks with a total of 21 VMs show that con- tinuous consolidation is able to reach 85% of the load of a 21 working nodes cluster.Externaliser l'hébergement d'une application Web n-tiers virtualisée dans un centre de données est une solution économiquement viable. Lorsque l'administrateur de l'application considère les problèmes de haute disponibilité tels que le passage à l'échelle et de tolérance aux pannes, chaque machine virtuelle (VM) embarquant un tiers est répliquée plusieurs fois pour absorber la charge et éviter les points de défaillance. Dans la pratique, ces VM doivent être placées selon des contraintes de placement précises. Pour fournir une qualité de service à toutes les applications hébergées, l'administrateur du centre de données doit considérer toutes leurs contraintes. Lorsque des contraintes de placement ne sont plus satisfaites, les VM alors doivent être ré-agencées au plus vite pour retrouver un placement viable. Ce travail est complexe dans un environnement consolidé où chaque nœud peut héberger plusieurs VM. Cet article présente Plasma, un système autonome pour héberger les VM des applications Web haute-disponibilité dans un centre de données utilisant la consolidation dynamique. Par l'intermédiaire de scripts de configuration, les administrateurs des applications décrivent les contraintes de placement de leur VM tandis que l'administrateur système décrit l'infrastructure du centre de données. Grâce à ces descriptions, Plasma optimise en continu le placement des VM pour fournir la qualité de service attendue. Une évaluation avec des données simulées montre que l'algorithme de reconfiguration de Plasma permet de superviser 2000 nœuds hébergeant 4000 VM selon 800 contraintes de placement. Une évaluation sur une grappe de 8 nœuds exécutant 3 instances de l'application RUBiS sur 21 VM montre que la consolidation fournit par Plasma atteint 85% des performances d'une grappe de 21 nœuds

    An automated approach for finding variable-constant pairing bugs

    Get PDF
    Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify the uses of named constants and to identify anomalies in these uses. We have applied our approach to a recent version of the Linux kernel and have found a number of bugs affecting both correctness and software maintenance. Many of these bugs have been validated by the Linux developers

    JMake: Dependable Compilation for Kernel Janitors

    Get PDF
    International audienceThe Linux kernel is highly configurable, and thus, in principle, any line of code can be included or excluded from the compiled kernel based on configuration operations. Configurability complicates the task of a kernel janitor, who cleans up faults across the code base. A janitor may not be familiar with the configuration options that trigger compilation of a particular code line, leading him to believe that a fix has been compile-checked when this is not the case. We propose JMake, a mutation-based tool for signaling changed lines that are not subjected to the compiler. JMake shows that for most of the 12,000 file-modifying commits between Linux v4.3 and v4.4 the configuration chosen by the kernel allyesconfig option is sufficient, once the janitor chooses the correct architecture. For most commits, this check requires only 30 seconds or less. We then characterize the situations in which changed code is not subjected to compilation in practice
    corecore